Data Introduction

Parks and Recreation was a television comedy show that aired on NBC from 2009 until 2015. I obtained the complete transcripts and performed text analysis on the show’s dialogue.

Citation for dataset: He, Luke. (2019, November 23) Parks and Recreation Scripts. Link to data.

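# Packages assumed from the functions used in this post
library(tidyverse)   # dplyr, purrr, readr, stringr, tidyr, ggplot2
library(here)        # here()
library(janitor)     # clean_names()
library(tidytext)    # unnest_tokens(), stop_words, get_sentiments()
library(kableExtra)  # kbl(), kable_material_dark()
library(tvthemes)    # theme_brooklyn99()
library(ggwordcloud) # geom_text_wordcloud()
library(ggpubr)      # background_image()
library(patchwork)   # combining plots with + and /
library(slider)      # slide()
library(rcartocolor) # scale_fill_carto_c()

file_names <- list.files(here("scripts")) # file names for each episode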

parks <- str_glue("scripts/{file_names}") %>% 
  map_dfr(read_csv) # read in all the episodes into one data frame!

# Tokenize lines to one word in each row
parks_token <- parks %>% 
  clean_names() %>% 
  unnest_tokens(word, line) %>% # tokenize
  anti_join(stop_words) %>% # remove stop words
  mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
  drop_na(word) # take out missing values

# Filter the top 10 characters with the most words
top_characters <- parks_token %>%
  dplyr::filter(character != "Extra") %>% 
  count(character, sort = TRUE) %>%
  slice_max(n, n = 10) 

# Obtain words only from the top 10 characters
parks_words <- parks_token %>% 
  inner_join(top_characters) %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% 
  select(-n) %>% 
  count(word, character, sort = TRUE) %>% 
  ungroup() %>% 
  group_by(character) %>% 
  slice_max(n, n = 9) # top 9 words per character

# Sample of a few lines from the show
parks %>% 
  slice_sample(n = 20) %>% # 20 random rows, no hard-coded row count
  kbl(caption = "<b style = 'color:white;'>
       Sample of a few randomly chosen lines from Parks and Recreation.") %>%
  kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, color = "white", background = "#222222") %>%
  scroll_box(width = "100%", height = "300px", 
             fixed_thead = list(enabled = T, background = "#222222"))
Sample of a few randomly chosen lines from Parks and Recreation.

Character        Line
Tom Haverford    Man, I really got things figured out.
Jerry Gergich    As you may know, I do like to tinker with things in my garage.
April Ludgate    You’re unpredictable, complex, and hard to read.
Extra            You’ve been served.
Joe              You’re welcome.
Tom Haverford    " Everything you just said makes me like me more.
Tom Haverford    600
Leslie Knope     So, you don’t have any idea what you want to do?
Ron Swanson      I’d love for you to stick around, Tommy.
Leslie Knope     And then it gets a little unpleasant.
Leslie Knope     Oh!
Leslie Knope     The best way to be safe is to simply postpone sex until marriage.
Joan Callamezzo  Looking at you, Leslie.
Music            Yes, I pray that you do love me too
Ian Winston      Leslie said I should come over here.
Leslie Knope     But I’m just gonna spend the remainder of my term cramming in as many good projects as I can.
Tom Haverford    The four sweetest words in the English language, “You wore me down.”
Leslie Knope     His mom’s gonna be here soon.
Tom Haverford    Oh, no.
Ann Perkins      But in the meantime, I’m just gonna tell you in English.
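
The tokenizing step used to build parks_token can be sketched on a tiny example. The toy data frame and its two lines are made up for illustration and are not taken from the dataset; the pipeline mirrors the one above.

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tidytext)

# Hypothetical stand-in for the transcript data (not real lines from the show)
toy <- tibble(
  character = c("Leslie Knope", "Ron Swanson"),
  line = c("I love waffles and Pawnee!", "It's 3 pounds of bacon.")
)

toy_token <- toy %>%
  unnest_tokens(word, line) %>%                    # one lowercase word per row
  anti_join(stop_words, by = "word") %>%           # drop common words such as "and"
  mutate(word = str_extract(word, "[a-z']+")) %>%  # keep letters and apostrophes only
  drop_na(word)                                    # "3" has no letters, so it is dropped

toy_token
```

Each remaining row is a single word tagged with the character who said it, which is the shape every later count and sentiment join relies on.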

Word Count of Major Characters

It’s difficult to choose a favorite character from Parks and Rec, so I plotted the nine most frequently used words for each of the ten characters with the most words. Some examples that will resonate with fans of the show are Chris Traeger’s “literally”, Jerry (Gary) Gergich’s “geez”, and Ben Wyatt’s “uh”.

ggplot(data = parks_words, 
       aes(x = n, y = word, fill = n)) +
  geom_col(show.legend = FALSE) +
  scale_fill_viridis_c(option = "plasma") +
  facet_wrap(~character, scales = "free") +
  theme_brooklyn99() +
  theme(panel.grid.major.y = element_blank(),
        axis.text.x = element_text(size = 9),
        axis.text.y = element_text(size = 9.5),
        axis.title = element_blank(),
        panel.grid.minor = element_blank(),
        strip.text = element_text(color = "white",
                                  face = "bold",
                                  size = 10.5))

Wordcloud

Below are four wordclouds of the 25 most frequently used words for the following characters, starting from the upper left-hand corner and going clockwise: Andy Dwyer, April Ludgate, Ron Swanson, and Leslie Knope. We can see Andy Dwyer’s enthusiasm for “karate” and “band”, Leslie Knope’s love for “pawnee”, “city”, and “parks”, and Ron Swanson’s contempt for “government”, as well as his two ex-wives, both named “tammy”.

# Ron Swanson

swanson_words <- parks_token %>% 
  filter(character == "Ron Swanson") %>% # filter for character
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n,n = 25) # choose top 25 words
  
swanson_pic <- jpeg::readJPEG(here("images","ron_swanson.jpg")) 

swanson_cloud <- ggplot(data = swanson_words,
                        aes(label = word)) +
  background_image(swanson_pic) + # add image of character
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "circle") +
  scale_size_area(max_size = 6) +
  theme_void()

# Leslie Knope

knope_words <- parks_token %>% 
  filter(character == "Leslie Knope") %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n,n = 25)
  
knope_pic <- jpeg::readJPEG(here("images", "knope.jpg"))

knope_cloud <- ggplot(data = knope_words,
                        aes(label = word)) +
  background_image(knope_pic) +
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "star") +
  scale_size_area(max_size = 6) +
  theme_void()

# April Ludgate

april_words <- parks_token %>% 
  filter(character == "April Ludgate") %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n,n = 25)
  
april_pic <- jpeg::readJPEG(here("images", "april.jpeg"))

april_cloud <- ggplot(data = april_words,
                        aes(label = word)) +
  background_image(april_pic) +
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "triangle-upright") +
  scale_size_area(max_size = 6) +
  theme_void()

# Andy Dwyer

andy_words <- parks_token %>% 
  filter(character == "Andy Dwyer") %>% 
  filter(!word %in% c("hey", "yeah", "gonna")) %>% # remove some more stopwords
  count(word) %>% 
  slice_max(n,n = 25)
  
andy_pic <- jpeg::readJPEG(here("images", "andy.jpg"))

andy_cloud <- ggplot(data = andy_words,
                        aes(label = word)) +
  background_image(andy_pic) +
  geom_text_wordcloud(aes(size = n), 
                      color = "turquoise1",
                      shape = "diamond") +
  scale_size_area(max_size = 6) +
  theme_void()

# Final patchwork wordcloud

patchwork <-  (andy_cloud + april_cloud) / (knope_cloud + swanson_cloud) 

patchwork & theme(plot.background = element_rect(fill = "#222222",
                                                 color = "#222222"),
                  strip.background = element_rect(fill = "#222222",
                                                 color = "#222222"))

Character Sentiment Analysis

Using the NRC lexicon, which bins 13,901 words into 8 emotions and also rates them as positive or negative, I plotted the count of each sentiment for the ten characters. All of the characters shown here use more positive words than negative ones, and all of them use words associated with trust and anticipation.

Citation for NRC lexicon: Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013. nrc lexicon

characters_sent <-  parks_token %>%
  inner_join(top_characters) %>%
  filter(!word %in% c("hey", "yeah", "gonna")) %>%
  select(-n) %>% 
  inner_join(get_sentiments("nrc")) %>% 
  count(sentiment, character, sort = TRUE)

ggplot(data = characters_sent, 
       aes(x = n, y = sentiment, fill = n)) +
  geom_col(show.legend = FALSE) +
  scale_fill_viridis_c(option = "plasma") +
  facet_wrap(~character, scales = "free") +
  theme_brooklyn99() +
  theme(panel.grid.major.y = element_blank(),
        axis.text.x = element_text(size = 7.5),
        axis.text.y = element_text(size = 9.5),
        axis.title = element_blank(),
        panel.grid.minor = element_blank(),
        strip.text = element_text(color = "white",
                                  face = "bold",
                                  size = 9.3))
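
To illustrate how the join-and-count step produces the numbers behind the bars, here is a minimal sketch. The tokens and the mini lexicon are invented for illustration (they are not real NRC entries), and the real lexicon is loaded with get_sentiments("nrc") as shown above.

```r
library(dplyr)

# Hypothetical tokens and a made-up mini lexicon (NOT the real NRC entries)
tokens <- tibble(
  character = c("Chris Traeger", "Chris Traeger", "Ron Swanson"),
  word = c("wonderful", "friend", "government")
)
toy_lexicon <- tibble(
  word = c("wonderful", "wonderful", "friend", "friend", "government"),
  sentiment = c("joy", "positive", "trust", "positive", "fear")
)

toy_sent <- tokens %>%
  inner_join(toy_lexicon, by = "word") %>%   # one row per word-sentiment pair
  count(sentiment, character, sort = TRUE)   # tally each sentiment per character

toy_sent
```

Because a single word can carry several labels in a lexicon shaped like this, it is counted once for each label, so the sentiment counts for a character can add up to more than the number of words they spoke.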

Trajectory of Sentiment

Parks and Recreation is a hilarious comedy show with many enjoyable characters, so it’s no surprise that the average sentiment is positive for most of the show. Using the AFINN lexicon, which assigns words a score between -5 (negative sentiment) and 5 (positive sentiment), I computed a moving average with a window size of 151 scored words (75 on either side of each word) and plotted it across the entirety of the show.

Citation for AFINN lexicon: AFINN, Nielsen, Finn Årup. Informatics and Mathematical Modelling, Technical University of Denmark. March 2011. AFINN lexicon

parks_afinn <- parks_token %>% 
  inner_join(get_sentiments("afinn")) %>%
  drop_na(value) %>% 
  mutate(index = row_number()) %>% # make an index
  mutate(moving_avg = slide_dbl(value, # get centered moving average
                                mean, 
                                .before = (151 - 1) / 2, 
                                .after = (151 - 1) / 2)) %>% 
  mutate(neg_pos = factor(case_when(
    moving_avg > 0 ~ "Positive",
    moving_avg <= 0 ~ "Negative"
  )))

sent_plot <- ggplot(data = parks_afinn, aes(x = index, y = moving_avg)) +
  geom_col(aes(fill = neg_pos)) +
  scale_fill_manual(values = c("Positive" = "springgreen2",
                               "Negative" = "darkred"))+
  theme_minimal() +
  labs(x = "Index",
       y = "Moving Average AFINN Sentiment",
       fill = "") +
  theme(panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_text(size = 11,
                                   face = "bold",
                                   color = "white"),
        axis.title.y = element_text(color = "white",
                                  size = 12,
                                  face = "bold"),
        axis.title.x = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#222222", 
                                       color = "#222222"),
        strip.background = element_rect(fill = "#222222", 
                                        color = "#222222"),
        legend.text = element_text(color = "white",
                                  size = 11,
                                  face = "bold"))

sent_plot
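
As a minimal sketch of what the centered moving average does, here is the same kind of sliding-window call on a short vector with a window of 3 instead of 151. The scores are made up, not values from the show.

```r
library(slider)

# Made-up sentiment scores, not values from the show
value <- c(2, -1, 3, -2, 1)

# Window of 3: one score before and one score after each position
moving_avg <- slide_dbl(value, mean, .before = 1, .after = 1)

moving_avg
```

With slider's default settings the window is simply truncated at the two ends of the vector, so the first and last averages are taken over fewer scores than the ones in the middle. The same applies to the first and last 75 scored words of the show.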

Sentiment Analysis of Season 4

I decided to take a closer look at the sentiment throughout season 4, one of the more popular seasons, in which Leslie Knope campaigns for a seat on the city council of Pawnee, Indiana. Here I used a moving average window of 51 to plot the AFINN sentiment value. For most of the season the average sentiment is positive, except for a noticeable drop near the end of the season where the sentiment score falls to around -1.

file_names_season <- str_sub(file_names, start = 3L)

# used this line of code to easily find the episode number of each season
# which(file_names_season == "e01.csv")

season_4 <- str_glue("scripts/{file_names[47:68]}") %>% 
  map_dfr(read_csv)

# Tokenize lines to one word in each row
season_token <-  season_4 %>% 
  clean_names() %>% 
  unnest_tokens(word, line) %>% # tokenize
  anti_join(stop_words) %>% # remove stop words
  mutate(word = str_extract(word, "[a-z']+")) %>% # extract words only
  drop_na(word) # take out missing values

season_afinn <- season_token %>% 
  inner_join(get_sentiments("afinn")) %>%
  drop_na(value) %>% 
  mutate(index = row_number()) %>% 
  mutate(moving_avg = slide_dbl(value,
                                mean, 
                                .before = (51 - 1) / 2, 
                                .after = (51 - 1) / 2)) 


season_plot <- ggplot(data = season_afinn, aes(x = index, y = moving_avg)) +
  geom_col(aes(fill = moving_avg)) +
  # scale_fill_distiller(type = "div",
  #                      palette = "GnPR")+
  scale_fill_carto_c(type = "diverging",
                     palette = "Earth") +
  theme_minimal() +
  labs(x = "Index",
       y = "Moving Average AFINN Sentiment",
       fill = "") +
  theme(panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_text(size = 11,
                                   face = "bold",
                                   color = "white"),
        axis.title.y = element_text(color = "white",
                                  size = 12,
                                  face = "bold"),
        axis.title.x = element_blank(),
        panel.grid.minor = element_blank(),
        plot.background = element_rect(fill = "#222222", 
                                       color = "#222222"),
        strip.background = element_rect(fill = "#222222", 
                                        color = "#222222"),
        legend.text = element_text(color = "white",
                                  size = 11,
                                  face = "bold"))

season_plot

Digging into the data, I found that this occurred during the penultimate episode of the season, “Bus Tour”. The episode starts with Leslie Knope trailing her opponent in the city council race, Bobby Newport, in the polls. During one of her campaign stops, in response to a question from a reporter, Leslie starts saying disparaging things about Bobby’s father. After she is finished, the reporter informs Leslie that the question was whether she had any comment on his death earlier in the day. Meanwhile, in order to get people to the polls, Leslie’s team tries to secure vans to transport potential voters, but Bobby Newport’s team has already secured all the vans in the city. Thus, most of the episode is spent doing damage control for the mishaps of Leslie and her campaign team. Below are the words that have AFINN ratings during this dip in sentiment in season 4.

# Investigate the negative dip of the plot
season_afinn_neg <- season_afinn %>% 
  filter(moving_avg < -0.75) %>% 
  slice(-c(1:2)) %>% 
  select(-index) %>% 
  rename('moving average' = moving_avg)

# How I figured out which episode it was
season_4_subset <- season_4 %>% 
  filter(Character == "Bill") 

# Table of words
season_afinn_neg %>% 
  kbl(caption = "<b style = 'color:white;'>
       What was happening towards the end of season 4 of Parks and Recreation when things went south?") %>%
  kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, color = "white", background = "#222222") %>%
  scroll_box(width = "100%", height = "300px", 
             fixed_thead = list(enabled = T, background = "#222222"))
What was happening towards the end of season 4 of Parks and Recreation when things went south?

character         word          value  moving average
Bill              grand             3  -0.7647059
Tom Haverford     demands          -1  -0.7647059
Tom Haverford     crying           -2  -0.8431373
Leslie Knope      promise           1  -0.8235294
Leslie Knope      stop             -1  -0.8823529
Leslie Knope      intimidating     -2  -0.8823529
Leslie Knope      bullying         -2  -0.9019608
Leslie Knope      jerk             -3  -0.9019608
Leslie Knope      wrong            -2  -0.9019608
Leslie Knope      died             -3  -0.8627451
Leslie Knope      sad              -2  -0.9215686
Extra             sad              -2  -0.9215686
Leslie Knope      bummer           -2  -0.8039216
Leslie Knope      jerk             -3  -0.7647059
Perd Hapley       love              3  -0.7647059
Jennifer Barkley  cancel           -1  -0.7843137
Leslie Knope      emergency        -2  -0.8235294
Leslie Knope      trust             1  -0.8431373
Leslie Knope      died             -3  -0.9019608
Leslie Knope      awful            -3  -0.8823529
Leslie Knope      died             -3  -0.7843137
Ann Perkins       dead             -3  -0.7843137
Ann Perkins       jerk             -3  -0.8823529
Leslie Knope      jerk             -3  -0.9411765
Leslie Knope      polluted         -2  -0.9803922
Ben Wyatt         stop             -1  -1.0980392
Ben Wyatt         stop             -1  -1.1372549
Ann Perkins       fine              2  -1.1568627
Ann Perkins       stop             -1  -1.1960784
Ann Perkins       apologize        -1  -1.1960784
Chris Traeger     worst            -3  -1.1764706
Chris Traeger     stop             -1  -1.0980392
Chris Traeger     stops            -1  -1.0588235
Chris Traeger     stop             -1  -1.0392157
Chris Traeger     stopping         -1  -0.9607843
Chris Traeger     death            -2  -0.8627451
Leslie Knope      beautiful         3  -0.8431373
Leslie Knope      classy            3  -0.8235294
Donna Meagle      free              1  -0.8235294
Donna Meagle      huge              1  -0.7647059
Bill              yeah              1  -0.8627451
Bill              hell             -4  -0.8039216
Bill              free              1  -0.8039216
Bill              pay              -1  -0.7843137
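
Above, I traced the dip back to an episode by filtering for a minor character. An alternative, sketched here under assumptions, is to keep the file name as a column when reading the episodes, so every word can be mapped to its episode directly. The two temporary files, their names, and their contents are hypothetical stand-ins for the real scripts.

```r
library(dplyr)
library(purrr)
library(readr)

# Hypothetical stand-ins for two episode files (names and contents are made up)
tmp <- tempfile("scripts")
dir.create(tmp)
write_csv(tibble(Character = "Leslie Knope", Line = "Hello."),
          file.path(tmp, "s4e21.csv"))
write_csv(tibble(Character = "Ann Perkins", Line = "Hi there."),
          file.path(tmp, "s4e22.csv"))

files <- list.files(tmp, full.names = TRUE)

# Name each path by its file name so .id can record where each row came from
episodes <- files %>%
  set_names(basename(files)) %>%
  map_dfr(read_csv, show_col_types = FALSE, .id = "episode")

episodes
```

With an episode column in place, a filter on the moving average could be followed by count(episode) to see which episode the low-scoring words come from.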